Capstone Project

Applied Data Science Capstone by IBM/Coursera

Introduction: Business Problem

In this capstone project I will try to find a good location for an Indian restaurant in Manhattan. The report is aimed at stakeholders interested in opening an Indian restaurant in Manhattan, NY.

Since Manhattan already has close to 2,900 restaurants, I will try to find locations

  • that are not already crowded with restaurants,
  • that have as few Indian restaurants as possible nearby,
  • where the share of Indian restaurants in the neighborhood is small,
  • and that are as close to the center of Manhattan as possible.

Using data science methods, I will identify and present to the stakeholders the most promising neighborhoods of Manhattan in which to open an Indian restaurant.

Data

Based on the definition of the business problem, the decision will be influenced by the following factors:

  • total number of existing restaurants in the neighborhood.
  • total number of Indian restaurants in the neighborhood.
  • share of Indian restaurants in the neighborhood.
  • distance to the next Indian restaurant, if there are any.
  • distance from city center.

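To find the most promising neighborhoods for opening an Indian restaurant in Manhattan, I will use the following data sources:

  • the New York neighborhood dataset (newyork_data.json), which provides the borough, name, and center coordinates of each neighborhood.
  • the geopy Nominatim geocoder, to obtain the coordinates of the center of Manhattan.
  • the Foursquare API, to retrieve all restaurants and all Indian restaurants within 500 m of each neighborhood center.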

In [3]:
#!conda install -c conda-forge folium --yes
#!conda install -c conda-forge geopy --yes
import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # public import location since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import pyproj
import math

print('All necessary libraries imported!')
All necessary libraries imported!

Load the New York neighborhoods dataset

In [4]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')
Data downloaded!
In [5]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
In [6]:
neighborhoods_data = newyork_data['features']
In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
In [8]:
# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
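
Each entry in `newyork_data['features']` is a GeoJSON feature whose coordinates are stored as `[longitude, latitude]`, which is why the loop reads index 1 for the latitude. The following is a minimal sketch of that mapping. The feature dictionary is a hypothetical hand-built sample that contains only the keys the loop reads, with the Marble Hill values taken from the table further down, and `feature_to_row` is an illustrative helper name, not part of the notebook.

```python
# Hypothetical sample feature; only the keys used by the loop are included.
sample_feature = {
    "properties": {"borough": "Manhattan", "name": "Marble Hill"},
    "geometry": {"coordinates": [-73.910660, 40.876551]},  # [lon, lat]
}


def feature_to_row(feature):
    """Turn one GeoJSON feature into a row dict for the neighborhoods table."""
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    }


row = feature_to_row(sample_feature)
print(row)
```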

Filter the dataframe for neighborhoods of Manhattan

In [9]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data
Out[9]:
Borough Neighborhood Latitude Longitude
0 Manhattan Marble Hill 40.876551 -73.910660
1 Manhattan Chinatown 40.715618 -73.994279
2 Manhattan Washington Heights 40.851903 -73.936900
3 Manhattan Inwood 40.867684 -73.921210
4 Manhattan Hamilton Heights 40.823604 -73.949688
5 Manhattan Manhattanville 40.816934 -73.957385
6 Manhattan Central Harlem 40.815976 -73.943211
7 Manhattan East Harlem 40.792249 -73.944182
8 Manhattan Upper East Side 40.775639 -73.960508
9 Manhattan Yorkville 40.775930 -73.947118
10 Manhattan Lenox Hill 40.768113 -73.958860
11 Manhattan Roosevelt Island 40.762160 -73.949168
12 Manhattan Upper West Side 40.787658 -73.977059
13 Manhattan Lincoln Square 40.773529 -73.985338
14 Manhattan Clinton 40.759101 -73.996119
15 Manhattan Midtown 40.754691 -73.981669
16 Manhattan Murray Hill 40.748303 -73.978332
17 Manhattan Chelsea 40.744035 -74.003116
18 Manhattan Greenwich Village 40.726933 -73.999914
19 Manhattan East Village 40.727847 -73.982226
20 Manhattan Lower East Side 40.717807 -73.980890
21 Manhattan Tribeca 40.721522 -74.010683
22 Manhattan Little Italy 40.719324 -73.997305
23 Manhattan Soho 40.722184 -74.000657
24 Manhattan West Village 40.734434 -74.006180
25 Manhattan Manhattan Valley 40.797307 -73.964286
26 Manhattan Morningside Heights 40.808000 -73.963896
27 Manhattan Gramercy 40.737210 -73.981376
28 Manhattan Battery Park City 40.711932 -74.016869
29 Manhattan Financial District 40.707107 -74.010665
30 Manhattan Carnegie Hill 40.782683 -73.953256
31 Manhattan Noho 40.723259 -73.988434
32 Manhattan Civic Center 40.715229 -74.005415
33 Manhattan Midtown South 40.748510 -73.988713
34 Manhattan Sutton Place 40.760280 -73.963556
35 Manhattan Turtle Bay 40.752042 -73.967708
36 Manhattan Tudor City 40.746917 -73.971219
37 Manhattan Stuyvesant Town 40.731000 -73.974052
38 Manhattan Flatiron 40.739673 -73.990947
39 Manhattan Hudson Yards 40.756658 -74.000111

Get the latitude and longitude of Manhattan with geopy

In [10]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))
The geographical coordinates of Manhattan are 40.7900869, -73.9598295.

Create a folium map of New York and mark all Manhattan neighborhoods and the center of Manhattan on it.

In [11]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
folium.Circle(location=[latitude, longitude], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=3000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=5000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=7000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=9000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=11000, fill=False, color='white').add_to(map_manhattan)
# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)   

map_manhattan
Out[11]:

Define dataframe with all neighborhoods, latitude, longitude, distance to center of Manhattan, x, y

In [12]:
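# NOTE: UTM zone 33 covers Central Europe, while Manhattan lies in UTM zone 18.
# The outputs recorded below were produced with zone=33, so the x/y values and
# the distances derived from them are distorted; rerun with zone=18 to obtain
# metric distances.
def lonlat_to_xy(lon, lat):
    """Project WGS84 longitude/latitude to UTM x/y coordinates."""
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    """Inverse of lonlat_to_xy: UTM x/y back to WGS84 longitude/latitude."""
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    """Euclidean distance between two projected points."""
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)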
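
The projected distances can be sanity-checked without any projection by computing a great-circle distance directly from latitude and longitude. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km (an assumption of this example, as is the helper name `haversine_m`). For Marble Hill it should give roughly 10.5 km from the Manhattan reference point, noticeably less than the 15,945 m recorded in the table below, which is consistent with the projection using UTM zone 33 rather than the zone that covers New York.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# Manhattan reference point (geocoder output above) and Marble Hill center
center = (40.7900869, -73.9598295)
marble_hill = (40.876551, -73.910660)

d = haversine_m(*center, *marble_hill)
print(round(d), "m")
```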
In [13]:
#calculate distances from center
distance_from_center=[]
X=[]
Y=[]

manhatten_longitude= longitude
manhatten_latitude=latitude
manhatten_x, manhatten_y= lonlat_to_xy(manhatten_longitude,manhatten_latitude)

for i in range(len(manhattan_data)):
    neigborhood_x, neigborhood_y= lonlat_to_xy(manhattan_data['Longitude'][i],manhattan_data['Latitude'][i])
    distance_from_center.append(calc_xy_distance(manhatten_x, manhatten_y, neigborhood_x, neigborhood_y)) 
    X.append(neigborhood_x)
    Y.append(neigborhood_y)
In [14]:
manhattan_data = manhattan_data.drop(columns='Borough')
manhattan_data['X']=X
manhattan_data['Y']=Y
manhattan_data['Distance from Center']=distance_from_center
manhattan_data
Out[14]:
Neighborhood Latitude Longitude X Y Distance from Center
0 Marble Hill 40.876551 -73.910660 -5.794205e+06 9.858099e+06 15945.318731
1 Chinatown 40.715618 -73.994279 -5.821760e+06 9.868103e+06 13386.331413
2 Washington Heights 40.851903 -73.936900 -5.798470e+06 9.861349e+06 10875.558655
3 Inwood 40.867684 -73.921210 -5.795743e+06 9.859410e+06 14045.714502
4 Hamilton Heights 40.823604 -73.949688 -5.803305e+06 9.862859e+06 5825.579136
5 Manhattanville 40.816934 -73.957385 -5.804461e+06 9.863817e+06 4558.839318
6 Central Harlem 40.815976 -73.943211 -5.804573e+06 9.861989e+06 4879.562566
7 East Harlem 40.792249 -73.944182 -5.808594e+06 9.862002e+06 2048.077645
8 Upper East Side 40.775639 -73.960508 -5.811466e+06 9.864025e+06 2450.104944
9 Yorkville 40.775930 -73.947118 -5.811369e+06 9.862302e+06 2904.700389
10 Lenox Hill 40.768113 -73.958860 -5.812735e+06 9.863778e+06 3726.329791
11 Roosevelt Island 40.762160 -73.949168 -5.813710e+06 9.862501e+06 4928.750788
12 Upper West Side 40.787658 -73.977059 -5.809488e+06 9.866213e+06 2256.859133
13 Lincoln Square 40.773529 -73.985338 -5.811911e+06 9.867214e+06 4321.155481
14 Clinton 40.759101 -73.996119 -5.814393e+06 9.868537e+06 7032.091172
15 Midtown 40.754691 -73.981669 -5.815091e+06 9.866655e+06 6627.031194
16 Murray Hill 40.748303 -73.978332 -5.816162e+06 9.866195e+06 7473.643188
17 Chelsea 40.744035 -74.003116 -5.816971e+06 9.869371e+06 9595.602241
18 Greenwich Village 40.726933 -73.999914 -5.819860e+06 9.868881e+06 11889.803368
19 East Village 40.727847 -73.982226 -5.819644e+06 9.866603e+06 10940.735481
20 Lower East Side 40.717807 -73.980890 -5.821342e+06 9.866385e+06 12553.599818
21 Tribeca 40.721522 -74.010683 -5.820815e+06 9.870247e+06 13347.615944
22 Little Italy 40.719324 -73.997305 -5.821142e+06 9.868510e+06 12935.416973
23 Soho 40.722184 -74.000657 -5.820668e+06 9.868956e+06 12660.015165
24 West Village 40.734434 -74.006180 -5.818610e+06 9.869723e+06 11168.174308
25 Manhattan Valley 40.797307 -73.964286 -5.807809e+06 9.864613e+06 1351.248231
26 Morningside Heights 40.808000 -73.963896 -5.805997e+06 9.864613e+06 3079.540301
27 Gramercy 40.737210 -73.981376 -5.818053e+06 9.866537e+06 9384.882201
28 Battery Park City 40.711932 -74.016869 -5.822463e+06 9.871002e+06 15157.882372
29 Financial District 40.707107 -74.010665 -5.823260e+06 9.870180e+06 15524.620042
30 Carnegie Hill 40.782683 -73.953256 -5.810247e+06 9.863125e+06 1513.625317
31 Noho 40.723259 -73.988434 -5.820444e+06 9.867383e+06 11916.304016
32 Civic Center 40.715229 -74.005415 -5.821864e+06 9.869539e+06 13988.910694
33 Midtown South 40.748510 -73.988713 -5.816163e+06 9.867535e+06 7970.635769
34 Sutton Place 40.760280 -73.963556 -5.814080e+06 9.864346e+06 5074.836562
35 Turtle Bay 40.752042 -73.967708 -5.815491e+06 9.864843e+06 6528.332861
36 Tudor City 40.746917 -73.971219 -5.816372e+06 9.865272e+06 7463.773133
37 Stuyvesant Town 40.731000 -73.974052 -5.819081e+06 9.865563e+06 10184.343643
38 Flatiron 40.739673 -73.990947 -5.817669e+06 9.867782e+06 9441.086152
39 Hudson Yards 40.756658 -74.000111 -5.814821e+06 9.869041e+06 7684.413616

Insert all Foursquare credentials

In [15]:
#hidden cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

Query all restaurants and all Indian restaurants for each neighborhood from the Foursquare API

  • food_category = '4d4b7105d754a06374d81259'
  • indian_restaurant='4bf58dd8d48988d10f941735'
In [16]:
def getNearbyVenues(names, latitudes, longitudes, category, radius=500, LIMIT=200):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            category,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
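
The request URL assembled inside `getNearbyVenues` can also be built from a parameter dictionary, which keeps the query string readable and avoids a long positional `format` call. This is a minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper name and the credential strings are placeholders.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      category, radius=500, limit=200):
    """Assemble the venues/explore request URL used by getNearbyVenues."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "categoryId": category,
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "lat,lng" readable instead of %2C
    return ("https://api.foursquare.com/v2/venues/explore?"
            + urlencode(params, safe=","))


# Chinatown center and the Indian restaurant category ID from this notebook
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                        40.715618, -73.994279, "4bf58dd8d48988d10f941735")
print(url)
```

The counts recorded in this notebook never exceed 100 restaurants per neighborhood, which suggests that the API returns at most 100 venues per request even though `LIMIT=200` is passed. Neighborhoods that show exactly 100 restaurants may therefore be undercounted.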
In [17]:
manhattan_restaurants = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],category='4d4b7105d754a06374d81259', radius=500, LIMIT=200
                                  )
Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
In [18]:
manhattan_indian_restaurants = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],category='4bf58dd8d48988d10f941735', radius=500, LIMIT=200
                                  )
Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
In [19]:
print(manhattan_restaurants.shape)
manhattan_restaurants.head(20)
(2891, 7)
Out[19]:
Neighborhood Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
0 Marble Hill 40.876551 -73.910660 Arturo's 40.874412 -73.910271 Pizza Place
1 Marble Hill 40.876551 -73.910660 Tibbett Diner 40.880404 -73.908937 Diner
2 Marble Hill 40.876551 -73.910660 Dunkin' 40.877136 -73.906666 Donut Shop
3 Marble Hill 40.876551 -73.910660 Land & Sea Restaurant 40.877885 -73.905873 Seafood Restaurant
4 Marble Hill 40.876551 -73.910660 Parrilla Latina 40.877473 -73.906073 Steakhouse
5 Marble Hill 40.876551 -73.910660 Subway Sandwiches 40.874667 -73.909586 Sandwich Place
6 Marble Hill 40.876551 -73.910660 Boston Market 40.877430 -73.905412 American Restaurant
7 Marble Hill 40.876551 -73.910660 SUBWAY 40.878493 -73.905385 Sandwich Place
8 Marble Hill 40.876551 -73.910660 Subway 40.877720 -73.905380 Sandwich Place
9 Marble Hill 40.876551 -73.910660 Hernandez Grocery 40.875897 -73.912591 Deli / Bodega
10 Marble Hill 40.876551 -73.910660 Terrace View Delicatessen 40.876476 -73.912746 Deli / Bodega
11 Marble Hill 40.876551 -73.910660 Rosarina Bakery (aka Franco Bakery) 40.874870 -73.909398 Bakery
12 Marble Hill 40.876551 -73.910660 Applebee's Grill + Bar 40.873685 -73.908928 American Restaurant
13 Marble Hill 40.876551 -73.910660 Pick Up Six: Asian Kitchen 40.878075 -73.907033 Asian Restaurant
14 Marble Hill 40.876551 -73.910660 Subway Sandwiches 40.878270 -73.905308 Sandwich Place
15 Chinatown 40.715618 -73.994279 Kiki's 40.714476 -73.992036 Greek Restaurant
16 Chinatown 40.715618 -73.994279 Spicy Village 40.717010 -73.993530 Chinese Restaurant
17 Chinatown 40.715618 -73.994279 The Fat Radish 40.715323 -73.991950 English Restaurant
18 Chinatown 40.715618 -73.994279 Scarr's Pizza 40.715335 -73.991649 Pizza Place
19 Chinatown 40.715618 -73.994279 Cheeky Sandwiches 40.715707 -73.991508 Sandwich Place
In [20]:
print(manhattan_indian_restaurants.shape)
manhattan_indian_restaurants.head(20)
(274, 7)
Out[20]:
Neighborhood Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
0 Chinatown 40.715618 -73.994279 Nyonya 40.719155 -73.996893 Malay Restaurant
1 Chinatown 40.715618 -73.994279 New Malaysia 40.715787 -73.996905 Malay Restaurant
2 Chinatown 40.715618 -73.994279 Dirt Candy 40.717890 -73.991015 Vegetarian / Vegan Restaurant
3 Chinatown 40.715618 -73.994279 Sanuria Restaurant 40.714681 -73.998006 Malay Restaurant
4 Chinatown 40.715618 -73.994279 Curry House Indian Cuisine 40.719046 -73.990849 Indian Restaurant
5 Chinatown 40.715618 -73.994279 Roasting Plant Coffee 40.717784 -73.990453 Coffee Shop
6 Washington Heights 40.851903 -73.936900 Kismat Indian Restaurant 40.855222 -73.936967 Indian Restaurant
7 Hamilton Heights 40.823604 -73.949688 Clove Indian Restaurant & Bar 40.821280 -73.950620 Indian Restaurant
8 Hamilton Heights 40.823604 -73.949688 Mumbai Masala 40.826866 -73.946486 Indian Restaurant
9 Manhattanville 40.816934 -73.957385 Chapati House - NYC 40.814572 -73.959154 Indian Restaurant
10 Manhattanville 40.816934 -73.957385 Simply Indian 40.814467 -73.959276 Indian Restaurant
11 Central Harlem 40.815976 -73.943211 Strength West Indian Restaurant 40.817880 -73.941921 Indian Restaurant
12 East Harlem 40.792249 -73.944182 SPICEHUT INDIAN RESTAURANT 40.794198 -73.939837 Indian Restaurant
13 East Harlem 40.792249 -73.944182 Indo Pak Halal Restaurant 40.794219 -73.939665 Indian Restaurant
14 East Harlem 40.792249 -73.944182 Glenn's Pizza Deli 40.790259 -73.947465 Pizza Place
15 Upper East Side 40.775639 -73.960508 Tandoor Oven 40.777140 -73.955696 Indian Restaurant
16 Upper East Side 40.775639 -73.960508 Candle Cafe 40.771407 -73.959138 Vegetarian / Vegan Restaurant
17 Upper East Side 40.775639 -73.960508 Flex Mussels 40.776337 -73.956430 Seafood Restaurant
18 Yorkville 40.775930 -73.947118 Mumtaz 40.774134 -73.948227 Indian Restaurant
19 Yorkville 40.775930 -73.947118 Tamarind East 40.776337 -73.952328 Indian Restaurant
In [21]:
print('Total number of restaurants in Manhattan:', len(manhattan_restaurants))
print('Total number of Indian restaurants in Manhattan:', len(manhattan_indian_restaurants))
print('Percentage of Indian restaurants in Manhattan: {:.2f}%'.format(len(manhattan_indian_restaurants) / len(manhattan_restaurants) * 100))
Total number of restaurants in Manhattan: 2891
Total number of Indian restaurants in Manhattan: 274
Percentage of Indian restaurants in Manhattan: 9.48%
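
The preview in Out[20] shows that the Indian restaurant category query also returns venues whose listed category is something else (Malay Restaurant, Coffee Shop, Pizza Place), so the count of 274 probably overstates the number of Indian restaurants. One way to tighten it is to keep only rows whose venue category is exactly "Indian Restaurant". The sketch below illustrates the filter on the six Chinatown rows copied from that preview; it is not the notebook's own method.

```python
# (Neighborhood, Venue, Venue Category) rows copied from the Out[20] preview
rows = [
    ("Chinatown", "Nyonya", "Malay Restaurant"),
    ("Chinatown", "New Malaysia", "Malay Restaurant"),
    ("Chinatown", "Dirt Candy", "Vegetarian / Vegan Restaurant"),
    ("Chinatown", "Sanuria Restaurant", "Malay Restaurant"),
    ("Chinatown", "Curry House Indian Cuisine", "Indian Restaurant"),
    ("Chinatown", "Roasting Plant Coffee", "Coffee Shop"),
]

# keep only venues whose listed category is exactly "Indian Restaurant"
indian_only = [r for r in rows if r[2] == "Indian Restaurant"]
print(len(indian_only), "of", len(rows), "rows kept")
```

With the notebook's dataframe, the equivalent filter would be `manhattan_indian_restaurants[manhattan_indian_restaurants['Venue Category'] == 'Indian Restaurant']`.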

Create a folium map to display all restaurants in Manhattan in different colors: Indian restaurants in green, other restaurants in red, and the center of Manhattan in orange.

In [76]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
folium.Circle(location=[latitude, longitude], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=3000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=5000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=7000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=9000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=11000, fill=False, color='white').add_to(map_manhattan)
  
for lat, lng in zip(manhattan_restaurants['Venue Latitude'], manhattan_restaurants['Venue Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
for lat, lng in zip(manhattan_indian_restaurants['Venue Latitude'], manhattan_indian_restaurants['Venue Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  

map_manhattan
Out[76]:

We now have a feeling for the data and have gathered all the information needed for the further analysis:

  • We know all neighborhoods and their center locations.
  • We know all restaurants in Manhattan and their locations.
  • We know all Indian restaurants and their locations.
  • We can visualize the locations and types of restaurants in Manhattan.

This concludes the data preparation phase. We can now continue with the analysis of the data to find the most promising neighborhoods.

Methodology

The goal of this project is to detect the most promising areas of Manhattan in which to open an Indian restaurant.

In the first step I want to see whether I can identify areas in Manhattan with a low density of restaurants and Indian restaurants that are as close as possible to the center of Manhattan.

Therefore I calculate additional figures for each neighborhood to get a better understanding of the data:

  • Number of restaurants in every neighborhood
  • Number of Indian restaurants in every neighborhood
  • Percentage of Indian restaurants in every neighborhood
  • Distance from the center of a neighborhood to the next Indian restaurant

Then I will use heatmaps to visualize:

  • the density of restaurants
  • the density of Indian restaurants

and choropleth maps to visualize:

  • the percentage of Indian restaurants in a neighborhood
  • the distance from the center of a neighborhood to the next Indian restaurant

In the second step I will take the identified areas and generate a grid of cells covering them.
For every grid cell I will calculate some figures that describe how good the location is, so that the cells can be filtered into a map of all areas that are promising for opening an Indian restaurant.
For each grid cell the following figures will be calculated:

  • Latitude
  • Longitude
  • Nearby restaurants
  • Distance to the nearest Indian restaurant
  • Distance to center of Manhattan

Then the generated dataframe of all grid cells will be filtered for grid cells where:

  • the nearest Indian restaurant is more than 500 m away
  • and there are no restaurants within a radius of 250 m
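
The two filter criteria can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's implementation: all coordinates are taken to be projected x/y values in meters, `promising_cells` is a hypothetical helper name, and the grid and restaurant positions are toy data.

```python
import math


def promising_cells(cells, restaurants, indian_restaurants,
                    min_indian_dist=500.0, crowd_radius=250.0):
    """Keep cells with no restaurant within crowd_radius and no Indian
    restaurant within min_indian_dist. Coordinates are (x, y) in meters."""
    kept = []
    for cx, cy in cells:
        # number of restaurants of any kind close to the cell
        nearby = sum(1 for rx, ry in restaurants
                     if math.hypot(rx - cx, ry - cy) <= crowd_radius)
        # distance to the closest Indian restaurant (infinite if there is none)
        nearest_indian = min((math.hypot(ix - cx, iy - cy)
                              for ix, iy in indian_restaurants),
                             default=math.inf)
        if nearby == 0 and nearest_indian > min_indian_dist:
            kept.append((cx, cy))
    return kept


# Hypothetical toy data: a 3x3 grid with 300 m spacing
cells = [(x, y) for x in (0, 300, 600) for y in (0, 300, 600)]
restaurants = [(0, 0), (50, 40)]   # includes the Indian restaurant at (0, 0)
indian_restaurants = [(0, 0)]

result = promising_cells(cells, restaurants, indian_restaurants)
print(result)
```

In this toy grid, the cell at the origin is dropped because it is crowded, and the cells within 500 m of the origin are dropped because the Indian restaurant is too close.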

In the final step I will generate a heatmap to visualize the filtered list of grid cells, which represents a map of all the promising locations for opening an Indian restaurant in Manhattan.

Analysis

Let's start the analysis by identifying areas in Manhattan with a low density of restaurants and Indian restaurants that are as close as possible to the center of Manhattan. To do so, let's derive some additional data from our prepared dataset.

First we need the number of restaurants and the number of Indian restaurants in every neighborhood.

In [23]:
# get the total number of restaurants in each neighborhood
restaurants_count = (manhattan_restaurants['Neighborhood']
                     .value_counts()
                     .rename_axis('Neighborhood')
                     .reset_index(name='Count'))

# get the total number of Indian restaurants in each neighborhood
indian_restaurants_count = (manhattan_indian_restaurants['Neighborhood']
                            .value_counts()
                            .rename_axis('Neighborhood')
                            .reset_index(name='Count'))

# only the last expression of a cell is displayed
indian_restaurants_count.head()
Out[23]:
Neighborhood Count
0 Noho 29
1 East Village 21
2 Midtown 21
3 Greenwich Village 19
4 Midtown South 18
In [24]:
manhattan_data.head()
Out[24]:
Neighborhood Latitude Longitude X Y Distance from Center
0 Marble Hill 40.876551 -73.910660 -5.794205e+06 9.858099e+06 15945.318731
1 Chinatown 40.715618 -73.994279 -5.821760e+06 9.868103e+06 13386.331413
2 Washington Heights 40.851903 -73.936900 -5.798470e+06 9.861349e+06 10875.558655
3 Inwood 40.867684 -73.921210 -5.795743e+06 9.859410e+06 14045.714502
4 Hamilton Heights 40.823604 -73.949688 -5.803305e+06 9.862859e+06 5825.579136
In [25]:
manhattan_data_v2=manhattan_data
In [26]:
manhattan_data_v2['Number of Restaurants']=manhattan_data_v2.Neighborhood.map(restaurants_count.set_index('Neighborhood')['Count'].to_dict())
manhattan_data_v2['Number of Indian Restaurants']=manhattan_data_v2.Neighborhood.map(indian_restaurants_count.set_index('Neighborhood')['Count'].to_dict())
# neighborhoods without any Indian restaurant are missing from the counts, so fill them with 0
manhattan_data_v2['Number of Indian Restaurants']=manhattan_data_v2['Number of Indian Restaurants'].fillna(0)
manhattan_data_v2.head()
Out[26]:
Neighborhood Latitude Longitude X Y Distance from Center Number of Restaurants Number of Indian Restaurants
0 Marble Hill 40.876551 -73.910660 -5.794205e+06 9.858099e+06 15945.318731 15 0.0
1 Chinatown 40.715618 -73.994279 -5.821760e+06 9.868103e+06 13386.331413 100 6.0
2 Washington Heights 40.851903 -73.936900 -5.798470e+06 9.861349e+06 10875.558655 74 1.0
3 Inwood 40.867684 -73.921210 -5.795743e+06 9.859410e+06 14045.714502 52 0.0
4 Hamilton Heights 40.823604 -73.949688 -5.803305e+06 9.862859e+06 5825.579136 62 2.0

Next we calculate the share of Indian restaurants in each neighborhood (stored as a fraction between 0 and 1, despite the column name).

In [27]:
# share of Indian restaurants among all restaurants, as a fraction between 0 and 1
manhattan_data_v2['Percentage of Indian Restaurants'] = (
    manhattan_data_v2['Number of Indian Restaurants']
    / manhattan_data_v2['Number of Restaurants']
).round(2)
manhattan_data_v2.head()
Out[27]:
Neighborhood Latitude Longitude X Y Distance from Center Number of Restaurants Number of Indian Restaurants Percentage of Indian Restaurants
0 Marble Hill 40.876551 -73.910660 -5.794205e+06 9.858099e+06 15945.318731 15 0.0 0.00
1 Chinatown 40.715618 -73.994279 -5.821760e+06 9.868103e+06 13386.331413 100 6.0 0.06
2 Washington Heights 40.851903 -73.936900 -5.798470e+06 9.861349e+06 10875.558655 74 1.0 0.01
3 Inwood 40.867684 -73.921210 -5.795743e+06 9.859410e+06 14045.714502 52 0.0 0.00
4 Hamilton Heights 40.823604 -73.949688 -5.803305e+06 9.862859e+06 5825.579136 62 2.0 0.03

Now we calculate the distance from the center of each neighborhood to the nearest Indian restaurant.

In [28]:
# project all Indian restaurants once instead of once per neighborhood
indian_xy = [lonlat_to_xy(lon, lat)
             for lat, lon in zip(manhattan_indian_restaurants['Venue Latitude'],
                                 manhattan_indian_restaurants['Venue Longitude'])]

Distances = []
for lat, lon in zip(manhattan_data_v2['Latitude'], manhattan_data_v2['Longitude']):
    # calculate x, y of the neighborhood center
    x_neigh, y_neigh = lonlat_to_xy(lon, lat)

    # distance to the closest Indian restaurant
    shortest_distance = min(calc_xy_distance(x_neigh, y_neigh, x_rest, y_rest)
                            for x_rest, y_rest in indian_xy)
    Distances.append(round(shortest_distance, 2))

manhattan_data_v2['Distance to Indian Restaurants from Center'] = Distances
manhattan_data_v2
Out[28]:
Neighborhood Latitude Longitude X Y Distance from Center Number of Restaurants Number of Indian Restaurants Percentage of Indian Restaurants Distance to Indian Restaurants from Center
0 Marble Hill 40.876551 -73.910660 -5.794205e+06 9.858099e+06 15945.318731 15 0.0 0.00 4943.62
1 Chinatown 40.715618 -73.994279 -5.821760e+06 9.868103e+06 13386.331413 100 6.0 0.06 340.24
2 Washington Heights 40.851903 -73.936900 -5.798470e+06 9.861349e+06 10875.558655 74 1.0 0.01 561.78
3 Inwood 40.867684 -73.921210 -5.795743e+06 9.859410e+06 14045.714502 52 0.0 0.00 2922.91
4 Hamilton Heights 40.823604 -73.949688 -5.803305e+06 9.862859e+06 5825.579136 62 2.0 0.03 411.44
5 Manhattanville 40.816934 -73.957385 -5.804461e+06 9.863817e+06 4558.839318 41 2.0 0.05 460.20
6 Central Harlem 40.815976 -73.943211 -5.804573e+06 9.861989e+06 4879.562566 46 1.0 0.02 362.68
7 East Harlem 40.792249 -73.944182 -5.808594e+06 9.862002e+06 2048.077645 53 3.0 0.06 540.80
8 Upper East Side 40.775639 -73.960508 -5.811466e+06 9.864025e+06 2450.104944 80 3.0 0.04 538.49
9 Yorkville 40.775930 -73.947118 -5.811369e+06 9.862302e+06 2904.700389 91 3.0 0.03 336.31
10 Lenox Hill 40.768113 -73.958860 -5.812735e+06 9.863778e+06 3726.329791 100 7.0 0.07 210.68
11 Roosevelt Island 40.762160 -73.949168 -5.813710e+06 9.862501e+06 4928.750788 14 0.0 0.00 1359.14
12 Upper West Side 40.787658 -73.977059 -5.809488e+06 9.866213e+06 2256.859133 61 9.0 0.15 42.63
13 Lincoln Square 40.773529 -73.985338 -5.811911e+06 9.867214e+06 4321.155481 53 1.0 0.02 756.07
14 Clinton 40.759101 -73.996119 -5.814393e+06 9.868537e+06 7032.091172 100 6.0 0.06 488.68
15 Midtown 40.754691 -73.981669 -5.815091e+06 9.866655e+06 6627.031194 100 21.0 0.21 388.89
16 Murray Hill 40.748303 -73.978332 -5.816162e+06 9.866195e+06 7473.643188 100 10.0 0.10 317.74
17 Chelsea 40.744035 -74.003116 -5.816971e+06 9.869371e+06 9595.602241 100 4.0 0.04 156.95
18 Greenwich Village 40.726933 -73.999914 -5.819860e+06 9.868881e+06 11889.803368 100 19.0 0.19 201.38
19 East Village 40.727847 -73.982226 -5.819644e+06 9.866603e+06 10940.735481 100 21.0 0.21 159.42
20 Lower East Side 40.717807 -73.980890 -5.821342e+06 9.866385e+06 12553.599818 45 3.0 0.07 655.05
21 Tribeca 40.721522 -74.010683 -5.820815e+06 9.870247e+06 13347.615944 64 2.0 0.03 466.43
22 Little Italy 40.719324 -73.997305 -5.821142e+06 9.868510e+06 12935.416973 100 6.0 0.06 60.39
23 Soho 40.722184 -74.000657 -5.820668e+06 9.868956e+06 12660.015165 100 9.0 0.09 373.64
24 West Village 40.734434 -74.006180 -5.818610e+06 9.869723e+06 11168.174308 100 5.0 0.05 487.80
25 Manhattan Valley 40.797307 -73.964286 -5.807809e+06 9.864613e+06 1351.248231 44 11.0 0.25 204.77
26 Morningside Heights 40.808000 -73.963896 -5.805997e+06 9.864613e+06 3079.540301 39 2.0 0.05 73.85
27 Gramercy 40.737210 -73.981376 -5.818053e+06 9.866537e+06 9384.882201 52 9.0 0.17 467.45
28 Battery Park City 40.711932 -74.016869 -5.822463e+06 9.871002e+06 15157.882372 36 2.0 0.06 598.26
29 Financial District 40.707107 -74.010665 -5.823260e+06 9.870180e+06 15524.620042 100 12.0 0.12 248.91
30 Carnegie Hill 40.782683 -73.953256 -5.810247e+06 9.863125e+06 1513.625317 68 4.0 0.06 315.33
31 Noho 40.723259 -73.988434 -5.820444e+06 9.867383e+06 11916.304016 100 29.0 0.29 44.37
32 Civic Center 40.715229 -74.005415 -5.821864e+06 9.869539e+06 13988.910694 88 11.0 0.12 254.99
33 Midtown South 40.748510 -73.988713 -5.816163e+06 9.867535e+06 7970.635769 100 18.0 0.18 249.89
34 Sutton Place 40.760280 -73.963556 -5.814080e+06 9.864346e+06 5074.836562 78 12.0 0.15 214.48
35 Turtle Bay 40.752042 -73.967708 -5.815491e+06 9.864843e+06 6528.332861 91 6.0 0.07 368.32
36 Tudor City 40.746917 -73.971219 -5.816372e+06 9.865272e+06 7463.773133 85 1.0 0.01 750.39
37 Stuyvesant Town 40.731000 -73.974052 -5.819081e+06 9.865563e+06 10184.343643 6 0.0 0.00 924.24
38 Flatiron 40.739673 -73.990947 -5.817669e+06 9.867782e+06 9441.086152 100 10.0 0.10 188.51
39 Hudson Yards 40.756658 -74.000111 -5.814821e+06 9.869041e+06 7684.413616 53 3.0 0.06 684.15
In [29]:
print('On average the distance from the center of a neighborhood to the closest Indian restaurant is: ', manhattan_data_v2['Distance to Indian Restaurants from Center'].mean())
On average the distance from the center of a neighborhood to the closest Indian restaurant is:  578.28175

Heatmaps to visualize the density of restaurants and Indian restaurants

Let's visualize the density of restaurants in Manhattan with a heatmap.
Red means higher density.
The blue dots mark the center of each neighborhood.

In [55]:
from folium import plugins
from folium.plugins import HeatMap
In [56]:
manhattan_neighborhoods_url = 'https://raw.githubusercontent.com/ibuilder/NYCPolyline/master/manhattan.geojson'
manhattan_neighborhoods = requests.get(manhattan_neighborhoods_url).json()

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }
In [57]:
restaurant_latlons=manhattan_restaurants['Venue Latitude'].to_frame()
restaurant_latlons['Venue Longitude']=manhattan_restaurants['Venue Longitude']
In [58]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
folium.Circle(location=[latitude, longitude], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=3000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=5000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=7000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=9000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=11000, fill=False, color='white').add_to(map_manhattan)
HeatMap(restaurant_latlons).add_to(map_manhattan)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan
Out[58]:

Now let's visualize the density of Indian restaurants in Manhattan with a heatmap.
Red means higher density.
The blue dots mark the center of each neighborhood.

In [59]:
indian_restaurant_latlons=manhattan_indian_restaurants['Venue Latitude'].to_frame()
indian_restaurant_latlons['Venue Longitude']=manhattan_indian_restaurants['Venue Longitude']
In [60]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
folium.Circle(location=[latitude, longitude], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=3000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=5000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=7000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=9000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=11000, fill=False, color='white').add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan
Out[60]:

Now let's overlay both heatmaps to see whether we can spot areas near the center of Manhattan with a low density of restaurants and a low density of Indian restaurants.

In [78]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
folium.Circle(location=[latitude, longitude], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=3000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=5000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=7000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=9000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=11000, fill=False, color='white').add_to(map_manhattan)
HeatMap(restaurant_latlons).add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan
Out[78]:
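
The map cells in this section repeat the same base-map setup: an orange center marker plus six white distance rings. A minimal sketch of how that setup could be factored into a helper is shown below. `base_map` and `RING_RADII_M` are hypothetical names that do not appear in the original notebook; the radii are taken from the cells above and are in meters, which is the unit `folium.Circle` uses.

```python
# Ring radii used in the cells above: 1000, 3000, ..., 11000 (meters).
RING_RADII_M = list(range(1000, 12000, 2000))

def base_map(center, zoom_start=13):
    """Return a folium map with the center marker and distance rings drawn.

    center: [latitude, longitude] of the reference point.
    """
    # Imported lazily so the constant above can be reused without folium.
    import folium

    m = folium.Map(location=center, zoom_start=zoom_start)
    # Orange dot marking the center of Manhattan.
    folium.CircleMarker(center, radius=7, color='orange', fill=True,
                        fill_color='orange', fill_opacity=1).add_to(m)
    # One white ring per radius instead of six copied lines.
    for radius in RING_RADII_M:
        folium.Circle(location=center, radius=radius,
                      fill=False, color='white').add_to(m)
    return m
```

With such a helper, each cell could start with `map_manhattan = base_map([latitude, longitude])` and then add only its own layers (heatmap, neighborhood markers, GeoJSON outline).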

Insights from density maps:

Looking at the density of restaurants and Indian restaurants in Manhattan, we can see a few areas with low density close to the center of Manhattan.

Close to the center of Manhattan:

  • A larger area north/north-east of Central Park
  • A small area west of the center of Central Park
  • A larger area south/south-west of Central Park

Somewhat farther away:

  • A larger area south of Central Park, between West Village and East Village

Close to the center of Manhattan, the areas with a low overall restaurant density match the areas with a low density of Indian restaurants quite well. The heatmap of Indian restaurants is also not very hot in general: at roughly 10 %, the overall share of Indian restaurants in Manhattan is fairly low.


Unfortunately, the neighborhoods in the New York dataset and in the GeoJSON file do not match perfectly.
Some neighborhoods are named differently or merged in the GeoJSON file.

Choropleth maps to visualize the share of Indian restaurants and the distance from the center of a neighborhood to the nearest Indian restaurant

Now let's visualize the share of Indian restaurants in Manhattan with a choropleth map.
The share is color-coded, from a low share in yellow to a high share in red.
The blue dots mark the center of each neighborhood.

In [62]:
newyork_geo = r'https://raw.githubusercontent.com/ibuilder/NYCPolyline/master/manhattan.geojson'

map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
folium.Circle(location=[latitude, longitude], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=3000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=5000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=7000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=9000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=11000, fill=False, color='white').add_to(map_manhattan)
map_manhattan.choropleth(
    geo_data=newyork_geo,
    data=manhattan_data_v2,
    columns=['Neighborhood', 'Percentage of Indian Restaurants'],
    key_on='feature.properties.neighborhood',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Percentage of Indian Restaurants'
)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
map_manhattan
Out[62]:

Unfortunately, the GeoJSON file and the New York dataset do not match perfectly.
The neighborhood names sometimes differ slightly, and the neighborhood centers sometimes do not match the GeoJSON file.

E.g.:

Hamilton Heights, Manhattanville, Central Harlem = Harlem in the GeoJSON file
Central Park does not exist in the New York dataset
Hudson Yards, Clinton = Hells Kitchen, Theater District in the GeoJSON file ...
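
One possible way to work around the naming mismatch is to map the dataset's neighborhood names onto the GeoJSON names and aggregate the counts before plotting. The sketch below is only an illustration and was not part of the analysis. The `NAME_MAP` entries follow the Harlem example listed above, the exact GeoJSON spelling is an assumption, and the counts for the three Harlem neighborhoods are made-up placeholders (only the Chelsea row is taken from the table above).

```python
from collections import defaultdict

# Dataset name -> GeoJSON name (assumed spelling, based on the example above).
NAME_MAP = {
    'Hamilton Heights': 'Harlem',
    'Manhattanville': 'Harlem',
    'Central Harlem': 'Harlem',
}

def aggregate_shares(rows, name_map=NAME_MAP):
    """Sum restaurant counts per GeoJSON name and recompute the Indian share.

    rows: iterable of (neighborhood, total_restaurants, indian_restaurants).
    Names without a mapping are kept unchanged.
    """
    totals = defaultdict(lambda: [0, 0])
    for name, total, indian in rows:
        key = name_map.get(name, name)
        totals[key][0] += total
        totals[key][1] += indian
    # Guard against neighborhoods with no restaurants at all.
    return {key: (indian / total if total else 0.0)
            for key, (total, indian) in totals.items()}

# Placeholder counts for the Harlem rows, for illustration only.
example = [('Hamilton Heights', 40, 2), ('Manhattanville', 30, 1),
           ('Central Harlem', 30, 2), ('Chelsea', 100, 4)]
shares = aggregate_shares(example)
print(shares)
```

The aggregated shares could then be used as the choropleth data, keyed by the GeoJSON name, so that merged neighborhoods are colored as well.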

Now let's visualize the distance from the center of each neighborhood to the nearest Indian restaurant in Manhattan with a choropleth map.
The distance is color-coded, from a short distance in yellow to a long distance in red.
The blue dots mark the center of each neighborhood.

In [63]:
newyork_geo = r'https://raw.githubusercontent.com/ibuilder/NYCPolyline/master/manhattan.geojson'

map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
folium.Circle(location=[latitude, longitude], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=3000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=5000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=7000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=9000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(location=[latitude, longitude], radius=11000, fill=False, color='white').add_to(map_manhattan)
map_manhattan.choropleth(
    geo_data=newyork_geo,
    data=manhattan_data_v2,
    columns=['Neighborhood', 'Distance to Indian Restaurants from Center'],
    key_on='feature.properties.neighborhood',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Distance to Indian Restaurants from Center'
)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
map_manhattan
Out[63]:

Insights from choropleth maps:

Unfortunately, the choropleth maps do not add much information, because the areas we identified in the heatmaps are exactly the ones that are named differently in the GeoJSON file. For those areas in particular, the choropleth maps show no data.

Let's focus on the following two areas and generate a grid of cells to evaluate each location in more detail.

In [64]:
center_manhattan=[latitude, longitude]
focus_area1=[40.762849, -73.980685]
focus_area2=[40.802038, -73.948810]

map_manhattan = folium.Map(location=center_manhattan, zoom_start=13)
HeatMap(restaurant_latlons).add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
#folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
folium.Marker(center_manhattan).add_to(map_manhattan)
folium.Circle(focus_area1, radius=1500, color='white', fill=True, fill_opacity=0.4).add_to(map_manhattan)
folium.Circle(focus_area2, radius=1100, color='white', fill=True, fill_opacity=0.4).add_to(map_manhattan)
map_manhattan
Out[64]:

Let's proceed with the second step of our analysis.
Let's define a grid of cells that covers the areas we identified before.

In [66]:
# define focus areas
focus_area1=[40.762849, -73.980685]
#focus_area2=[40.80451, -73.946072]
#focus_area2=[40.803429, -73.947583]
focus_area2=[40.802038, -73.948810]


# define area 1 
lat1_min=focus_area1[0]-0.008
lon1_min=focus_area1[1]-0.014
lat1_max=focus_area1[0]+0.008
lon1_max=focus_area1[1]+0.014
# define area 2
lat2_min=focus_area2[0]-0.008*1300/1500
lon2_min=focus_area2[1]-0.014*1300/1500
lat2_max=focus_area2[0]+0.008*1300/1500
lon2_max=focus_area2[1]+0.014*1300/1500

#corner points of area 1
point1=[lat1_min,lon1_min]
point2=[lat1_max,lon1_min]
point3=[lat1_max,lon1_max]
point4=[lat1_min,lon1_max]
#corner points of area 2
point5=[lat2_min,lon2_min]
point6=[lat2_max,lon2_min]
point7=[lat2_max,lon2_max]
point8=[lat2_min,lon2_max]

#define lists for latitudes and longitudes 
focus_area_latitudes=[]
focus_area_longitudes=[]

#define a grid of points in area1
stepwith=0.0012
steps1_lat=int(round((lat1_max-lat1_min)/stepwith,0))
steps1_lon=int(round((lon1_max-lon1_min)/stepwith,0))

long=lon1_min
for i in range(steps1_lon):
    long=long+stepwith
    lati=lat1_min
    for s in range(steps1_lat):
        lati=lati+stepwith
        focus_area_latitudes.append(lati)
        focus_area_longitudes.append(long)
        
#define a grid of points in area2
steps2_lat=int(round((lat2_max-lat2_min)/stepwith,0))
steps2_lon=int(round((lon2_max-lon2_min)/stepwith,0))

long=lon2_min
for i in range(steps2_lon):
    long=long+stepwith
    lati=lat2_min
    for s in range(steps2_lat):
        lati=lati+stepwith
        focus_area_latitudes.append(lati)
        focus_area_longitudes.append(long)
        
print(str(len(focus_area_latitudes))+" grid points generated!")
539 grid points generated!
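
The two nested loops above build the same kind of grid twice. As a hedged sketch, the logic could be wrapped in a small helper. `make_grid` is a hypothetical name that is not part of the notebook; the centers, half-widths, scale factor and step size are copied from the cell above.

```python
def make_grid(center, half_lat, half_lon, step):
    """Return (lat, lon) grid points in a box of +/- half_lat, half_lon around center.

    Mirrors the loops above: the first point lies one step inside the
    lower-left corner, and the step count is rounded to the nearest integer.
    """
    lat_min = center[0] - half_lat
    lon_min = center[1] - half_lon
    n_lat = int(round(2 * half_lat / step))
    n_lon = int(round(2 * half_lon / step))
    # Outer loop over longitude, inner loop over latitude, as in the original.
    return [(lat_min + (s + 1) * step, lon_min + (i + 1) * step)
            for i in range(n_lon)
            for s in range(n_lat)]

step = 0.0012
scale = 1300 / 1500  # area 2 is scaled down relative to area 1
grid = (make_grid([40.762849, -73.980685], 0.008, 0.014, step)
        + make_grid([40.802038, -73.948810], 0.008 * scale, 0.014 * scale, step))
print(len(grid), "grid points generated!")
```

Note that the step of 0.0012 is applied in degrees to both latitude and longitude, so the cells are not square in meters; the sketch keeps that behavior unchanged.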
In [77]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker(center_manhattan).add_to(map_manhattan)
for lat, lng, in zip(focus_area_latitudes, focus_area_longitudes):
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
HeatMap(restaurant_latlons).add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan
Out[77]:

Looks great. The grids cover most of the free space near the center of Manhattan where the density of both restaurants and Indian restaurants is low.
Now let's build a dataframe of all those points and calculate the important figures for each of them:

  • Latitude
  • Longitude
  • Nearby restaurants
  • Distance to next Indian restaurant
  • Distance to center of Manhattan
In [68]:
restaurants_nearby=[]
distance_next_indian=[]
distance_center_manhattan=[]

for i in range(len(focus_area_latitudes)): #539

    #calculate x,y of grid point
    x_grid, y_grid = lonlat_to_xy(focus_area_longitudes[i], focus_area_latitudes[i])
    count=0
    shortest_distance=None
    
    #calculate number of restaurants in area of 250m around
    for s in range(len(restaurant_latlons)):
        x_restaurant, y_restaurant = lonlat_to_xy(restaurant_latlons['Venue Longitude'][s], restaurant_latlons['Venue Latitude'][s])
        distance=calc_xy_distance(x_grid, y_grid, x_restaurant, y_restaurant)
        if distance<250:
            count=count+1
    restaurants_nearby.append(count)
            
    #calculate distance to next Indian restaurant
    for k in range(len(indian_restaurant_latlons)):
        x_restaurant, y_restaurant = lonlat_to_xy(indian_restaurant_latlons['Venue Longitude'][k], indian_restaurant_latlons['Venue Latitude'][k])
        dist=calc_xy_distance(x_grid, y_grid, x_restaurant, y_restaurant)
        if shortest_distance is None:
            shortest_distance=dist
        elif dist<shortest_distance:
            shortest_distance=dist
    distance_next_indian.append(round(shortest_distance,0))
    
    #calculate distance to center of manhattan
    x_center_manhattan, y_center_manhattan = lonlat_to_xy(center_manhattan[1], center_manhattan[0])
    dist=calc_xy_distance(x_grid, y_grid, x_center_manhattan, y_center_manhattan)
    distance_center_manhattan.append(round(dist,0))
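
The loop above relies on the helper functions `lonlat_to_xy` and `calc_xy_distance` defined earlier in the notebook. As an alternative illustration (not the method used here), the same kind of figures can be computed directly from latitude/longitude pairs with the haversine great-circle formula. The function names and the mean Earth radius of 6,371 km are assumptions of this sketch, and its distances will not be identical to the projected distances used in the notebook.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an assumption of this sketch

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def count_within(point, venues, radius_m):
    """Number of venues closer than radius_m to point; all are (lat, lon) pairs."""
    return sum(1 for v in venues
               if haversine_m(point[0], point[1], v[0], v[1]) < radius_m)

def nearest_distance(point, venues):
    """Distance in meters to the closest venue, or None if venues is empty."""
    return min((haversine_m(point[0], point[1], v[0], v[1]) for v in venues),
               default=None)
```

For a grid point `p` and lists of `(lat, lon)` tuples, `count_within(p, restaurants, 250)` would correspond to the nearby-restaurant count, and `nearest_distance(p, indian_restaurants)` to the distance to the next Indian restaurant.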
In [69]:
grid_df=pd.DataFrame({'Latitude':focus_area_latitudes,
                      'Longitude':focus_area_longitudes,
                      'Restaurants nearby':restaurants_nearby,
                      'Distance next Indian Restaurant':distance_next_indian,
                      'Distance to Center':distance_center_manhattan})
grid_df.head()
Out[69]:
Latitude Longitude Restaurants nearby Distance next Indian Restaurant Distance to Center
0 40.756049 -73.993485 18 143.0 7218.0
1 40.757249 -73.993485 15 69.0 7056.0
2 40.758449 -73.993485 15 187.0 6897.0
3 40.759649 -73.993485 24 234.0 6740.0
4 40.760849 -73.993485 14 298.0 6585.0

Let's filter the grid points. We are interested in locations with no restaurant within a radius of 250m and no Indian restaurant within a radius of 500m.

In [72]:
good_res_count = np.array((grid_df['Restaurants nearby']<=0))
print('Locations with no restaurants within 250m:', good_res_count.sum())
good_ind_distance = np.array(grid_df['Distance next Indian Restaurant']>=500)
print('Locations with no Indian restaurants within 500m:', good_ind_distance.sum())
good_locations = np.logical_and(good_res_count, good_ind_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = grid_df[good_locations]
df_good_locations.head()
Locations with no restaurants within 250m: 417
Locations with no Indian restaurants within 500m: 406
Locations with both conditions met: 372
Out[72]:
Latitude Longitude Restaurants nearby Distance next Indian Restaurant Distance to Center
7 40.764449 -73.993485 0 767.0 6139.0
8 40.765649 -73.993485 0 959.0 5996.0
9 40.766849 -73.993485 0 1155.0 5858.0
10 40.768049 -73.993485 0 1353.0 5723.0
11 40.769249 -73.993485 0 1553.0 5592.0
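
The filter step can be illustrated on the ten rows shown in the two outputs above. This is a sketch in plain Python so that both thresholds are explicit; `is_good_location` and its parameter names are hypothetical and not part of the notebook.

```python
# (index, restaurants within 250 m, distance to next Indian restaurant, distance to center)
# Values copied from the two outputs above.
rows = [
    (0, 18, 143.0, 7218.0),
    (1, 15, 69.0, 7056.0),
    (2, 15, 187.0, 6897.0),
    (3, 24, 234.0, 6740.0),
    (4, 14, 298.0, 6585.0),
    (7, 0, 767.0, 6139.0),
    (8, 0, 959.0, 5996.0),
    (9, 0, 1155.0, 5858.0),
    (10, 0, 1353.0, 5723.0),
    (11, 0, 1553.0, 5592.0),
]

def is_good_location(restaurants_nearby, dist_indian,
                     max_restaurants=0, min_indian_dist=500):
    """True if both criteria hold: few enough restaurants nearby and
    the next Indian restaurant at least min_indian_dist away."""
    return restaurants_nearby <= max_restaurants and dist_indian >= min_indian_dist

good = [row for row in rows if is_good_location(row[1], row[2])]
print([row[0] for row in good])  # [7, 8, 9, 10, 11]
```

Keeping the thresholds as parameters makes it easy to check how sensitive the number of good locations is to the 250m and 500m choices.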

Let's visualize the grid points with no restaurant within 250m and no Indian restaurant within 500m.

In [74]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker(center_manhattan).add_to(map_manhattan)
for lat, lng, in zip(df_good_locations['Latitude'], df_good_locations['Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
HeatMap(restaurant_latlons).add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan
Out[74]:

Looks good. The remaining grid points fit neatly into the gaps between the hot areas of the restaurant and Indian restaurant heatmaps.

Let's visualize a heatmap of the good locations that meet both criteria: no restaurant within 250m and no Indian restaurant within a radius of 500m.

In [75]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker(center_manhattan).add_to(map_manhattan)
HeatMap(pd.DataFrame({'Latitude':df_good_locations['Latitude'],
                      'Longitude':df_good_locations['Longitude']})).add_to(map_manhattan)
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan
Out[75]:

The map represents the final result. It shows all the promising areas close to the center of Manhattan for opening an Indian restaurant.
Be aware that the part of the heatmap that overlaps Central Park needs to be ignored, because it is obviously not possible to open a restaurant there.

Results and Discussion

The analysis shows some areas close to the center of Manhattan where the density of restaurants and Indian restaurants is low, even though Manhattan has nearly 3000 restaurants.

The analysis presents two areas where there is no Indian restaurant within a radius of at least 500m and no restaurant of any kind within a radius of at least 250m.

From a competition perspective, the analysis presents two quite large areas where it might be interesting to open an Indian restaurant. It does not take into account whether the rent is affordable, whether spaces are available for a restaurant, or whether the neighborhood is attractive.

Conclusion

The purpose of this analysis was to present stakeholders with attractive locations for opening an Indian restaurant in Manhattan.

To that end, the analysis used data science to calculate the density of restaurants and Indian restaurants. By visualizing those densities, we were able to identify two areas quite close to the center of Manhattan where the density of restaurants and Indian restaurants is very low.

This analysis provides a foundation for stakeholders deciding where to open an Indian restaurant. For the decision, additional factors need to be taken into account, for example the rent, the availability of suitable spaces for a restaurant, the population density and the overall attractiveness of the neighborhood.